100 research outputs found

    Parsing Large XES Files for Discovering Process Models: A Big Data Problem

    Process mining is a family of techniques for retrieving de-facto models from system traces. Discovery algorithms can obtain mathematical models by exploiting the information contained in lists of activity events. The completeness of the traces is relevant to the accuracy of the final results, and noiseless traces are the ideal scenario. The performance of the algorithms is significantly reduced if the log files are not processed efficiently. XES is a logical model for process logs stored in data-centric XML files. In real processes, the size of the logs grows exponentially, so parsing XES files becomes a big data problem in real scenarios with dense traces. Lazy parsers and DOM models are not appropriate for scenarios with large volumes of data. We discuss this problem and show how to use indexing techniques to retrieve information that is useful for process mining. An XES compression schema is also discussed for reducing the index construction time.
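Since DOM parsers materialise the entire document in memory, a streaming approach keeps memory bounded on large logs. A minimal sketch in Python, assuming the standard XES layout of `<trace>`/`<event>` elements carrying `concept:name` string attributes (the tiny inline log is invented for illustration):

```python
import io
import xml.etree.ElementTree as ET

# Minimal XES-like fragment for illustration (real logs can be gigabytes).
XES = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register"/>
    </event>
    <event>
      <string key="concept:name" value="approve"/>
    </event>
  </trace>
</log>"""

def stream_activities(source):
    """Yield activity names one event at a time instead of building a DOM."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "event":
            for attr in elem:
                if attr.get("key") == "concept:name":
                    yield attr.get("value")
            elem.clear()  # discard the processed subtree: memory stays bounded

activities = list(stream_activities(io.StringIO(XES)))
print(activities)  # ['register', 'approve']
```

The `elem.clear()` call is what distinguishes this from a DOM build: each event subtree is dropped as soon as its activity name has been extracted.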

    Optimized Indexes for Data Structured Retrieval

    The aim of this work is to present a novel index structure based on a suffix array and a ternary search tree with a rank and select succinct data structure. Suffix arrays were originally developed to reduce memory consumption compared to suffix trees, while ternary search trees combine the time efficiency of digital tries with the space efficiency of binary search trees. The rank of a symbol at a given position equals the number of times the symbol appears in the corresponding prefix of the sequence; select is the inverse operation, retrieving the positions of the symbol's occurrences. These operations are widely used in information retrieval and management, being the basis of several data structures and algorithms for text collections, graphs, trees, etc. The resulting structure is faster than hashing for many typical search problems and supports a broader range of useful problems and operations. We therefore implement a path index based on these data structures that has proven highly efficient when dealing with digital collections consisting of structured documents. We describe how the index architecture works, compare its search algorithms with others, and finally present experiments showing that it outperforms earlier approaches.
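The rank and select semantics described above can be illustrated naively over a plain Python string. Succinct structures answer the same queries in constant or near-constant time with little extra space; this sketch only shows what the operations mean, not how they are made fast:

```python
def rank(seq, symbol, pos):
    """Number of occurrences of `symbol` in the prefix seq[:pos]."""
    return seq[:pos].count(symbol)

def select(seq, symbol, k):
    """Position of the k-th occurrence (1-based) of `symbol`, or -1 if absent."""
    count = 0
    for i, s in enumerate(seq):
        if s == symbol:
            count += 1
            if count == k:
                return i
    return -1

seq = "abracadabra"
print(rank(seq, "a", 5))    # 'a' appears twice in the prefix "abrac" -> 2
print(select(seq, "a", 3))  # the third 'a' sits at index 5
```

Note the inverse relationship: `rank(seq, c, select(seq, c, k) + 1) == k` whenever the k-th occurrence exists.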

    Evaluating the quality of linked open data in digital libraries

    Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data. In this report, the methodology defined by previous research for the evaluation of the quality of LOD is analysed and adapted to the specific case of Resource Description Framework (RDF) triples containing standard bibliographic information. The specified quality measures are reported in the case of four highly relevant libraries. This work has been partially supported by the ECLIPSE-UA RTI2018-094283-B-C32 project (Spanish Ministry of Education and Science).
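One simple quality measure of this kind, completeness of required bibliographic fields, can be sketched over a toy set of RDF triples. The subjects, predicates, and values below are invented for illustration; the report's actual measures are more elaborate:

```python
# Hypothetical triples for two bibliographic records, using Dublin Core predicates.
triples = [
    ("ex:book1", "dcterms:title",   "Don Quijote"),
    ("ex:book1", "dcterms:creator", "Cervantes"),
    ("ex:book2", "dcterms:title",   "La Regenta"),
]

def completeness(triples, required):
    """Share of subjects carrying every required predicate (one basic quality measure)."""
    subjects = {s for s, _, _ in triples}
    def has_all(subject):
        preds = {p for s, p, _ in triples if s == subject}
        return required <= preds
    return sum(has_all(s) for s in subjects) / len(subjects)

score = completeness(triples, {"dcterms:title", "dcterms:creator"})
print(score)  # 0.5 -- only ex:book1 has both a title and a creator
```

The same shape of check extends to other dimensions of LOD quality, such as the share of object values that are URIs rather than literals.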

    Application of Data Mining techniques to identify relevant Key Performance Indicators

    Currently, dashboards are the preferred tool across organizations to monitor business performance. Dashboards are often composed of different data visualization techniques, amongst which are Key Performance Indicators (KPIs), which play a crucial role in quickly providing accurate information by comparing current performance against a target required to fulfill business objectives. However, KPIs are not always well known and sometimes it is difficult to find an appropriate KPI to associate with each business objective. In addition, Data Mining techniques are often used when forecasting trends and visualizing data correlations. In this paper we present a new approach to combining these two aspects in order to drive Data Mining techniques to obtain specific KPIs for business objectives in a semi-automated way. The main benefit of our approach is that organizations do not need to rely on existing KPI lists or test KPIs over a cycle, as they can analyze their behavior using existing data. In order to show the applicability of our approach, we apply our proposal to the fields of Massive Open Online Courses (MOOCs) and Open Data extracted from the University of Alicante in order to identify the KPIs. This work has been funded by the Spanish Ministry of Economy and Competitiveness under the project Grant SEQUOIA-UA (TIN2015-63502-C3-3-R). Alejandro Maté is funded by the Generalitat Valenciana (APOSTD/2014/064).

    Adding value to Linked Open Data using a multidimensional model approach based on the RDF Data Cube vocabulary

    Most organisations using Open Data currently focus on data processing and analysis. However, although Open Data may be available online, these data are generally of poor quality, thus discouraging others from contributing to and reusing them. This paper describes an approach to publish statistical data from public repositories by using Semantic Web standards published by the W3C, such as RDF and SPARQL, in order to facilitate the analysis of multidimensional models. We have defined a framework based on the entire lifecycle of data publication, including a novel step of Linked Open Data assessment and the use of external repositories as a knowledge base for data enrichment. As a result, users are able to interact with the data generated according to the RDF Data Cube vocabulary, which makes it possible for general users to avoid the complexity of SPARQL when analysing data. The use case was applied to the Barcelona Open Data platform and revealed the benefits of the application of our approach, such as helping in the decision-making process. This work was supported in part by the Spanish Ministry of Science, Innovation and Universities through the Project ECLIPSE-UA under grant RTI2018-094283-B-C32.
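A flat statistical record can be mapped to triples in the spirit of the RDF Data Cube vocabulary, where each row becomes a `qb:Observation` carrying its dimensions and measure. This is a simplified sketch: the `ex:` predicates and the sample record are invented, and a real cube also needs a `qb:DataSet` and a data structure definition:

```python
# Hypothetical flat statistical record from an open-data portal.
record = {"district": "Eixample", "year": 2019, "population": 266_000}

QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube vocabulary namespace

def to_observation(record, obs_uri, dimensions, measure):
    """Map one flat row to qb:Observation triples (simplified sketch)."""
    triples = [(obs_uri, "rdf:type", QB + "Observation")]
    for dim in dimensions:                # dimension properties locate the cell
        triples.append((obs_uri, f"ex:{dim}", str(record[dim])))
    triples.append((obs_uri, f"ex:{measure}", str(record[measure])))  # the measure
    return triples

obs = to_observation(record, "ex:obs1", ["district", "year"], "population")
for triple in obs:
    print(triple)
```

Once data are shaped this way, generic cube browsers can slice by dimension values, which is what lets general users avoid writing SPARQL themselves.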

    Reusing digital collections from GLAM institutions

    For some decades now, Galleries, Libraries, Archives and Museums (GLAM) institutions have published and provided access to information resources in digital format. Recently, innovative approaches have appeared, such as the concept of Labs within GLAM institutions, which facilitates the adoption of innovative and creative tools for content delivery and user engagement. In addition, new methods have been proposed to address the publication of digital collections as data sets amenable to computational use. In this article, we propose a methodology to create machine-actionable collections following a set of steps. This methodology is then applied to several use cases based on data sets published by relevant GLAM institutions. It intends to encourage institutions to adopt the publication of data sets that support computationally driven research as a core activity. This work has been partially supported by ECLIPSE-UA RTI2018-094283-B-C32 (Spanish Ministry of Education and Science).

    A benchmark of Spanish language datasets for computationally driven research

    In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes. This research has been funded by the AETHER-UA (PID2020-112540RB-C43) Project from the Spanish Ministry of Science and Innovation.

    Study and planning of contents, materials and teaching methodologies according to the EHEA: Content Management itinerary, 4th year of Multimedia Engineering

    The network aims to serve as a platform for coordinating the fourth-year courses that make up the Content Management itinerary of the Multimedia Engineering degree, in order to implement a project-based methodology. The network has identified the coordination problems associated with simultaneously managing projects of varied nature, as well as the possibilities of connecting them with the Work Placement I and II courses and the development of the Final Degree Project. Solutions are proposed to alleviate these problems, and the scope of the objectives to be achieved is explained.

    The James Webb Space Telescope Mission

    Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least 4 m. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the 6.5 m James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit. Comment: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures.